Abstract: This paper studies network information theory 
problems where the external noise is Gaussian distributed. In 
particular, the Gaussian broadcast channel with coherent fading 
and the Gaussian interference channel are investigated. It is 
shown that in these problems, non-Gaussian code ensembles can 
achieve higher rates than the Gaussian ones. It is also shown 
that the strong Shamai-Laroia conjecture on the Gaussian ISI 
channel does not hold. In order to analyze non-Gaussian code 
ensembles over Gaussian networks, a geometrical tool using the 
Hermite polynomials is proposed. This tool provides a coordinate 
system to analyze a class of non-Gaussian input distributions that 
are invariant over Gaussian networks. 

I. Introduction 

Let a memoryless additive white Gaussian noise (AWGN) 
channel be described by $Y = X + Z$, where $Z \sim \mathcal{N}(0, v)$ 
is independent of $X$. If the input is subject to an average 
power constraint $\mathbb{E}X^2 \le p$, the input distribution 
maximizing the mutual information is Gaussian. This is due 
to the fact that, under a second moment constraint, the Gaussian 
distribution maximizes the entropy, hence 

$$\arg\max_{X:\, \mathbb{E}X^2 = p} h(X + Z) \sim \mathcal{N}(0, p). \qquad (1)$$

On the other hand, if we use a Gaussian input distribution, 
i.e., $X \sim \mathcal{N}(0, p)$, the worst noise that can occur, i.e., the 
noise minimizing the mutual information, among noises with 
bounded second moment, is again Gaussian distributed. This 
can be shown by using the entropy power inequality (EPI), cf. 
[11], which reduces in this setting to 

$$\arg\min_{Z:\, h(Z) = \frac{1}{2}\log 2\pi e v} h(X + Z) \sim \mathcal{N}(0, v) \qquad (2)$$

and implies 

$$\arg\min_{Z:\, \mathbb{E}Z^2 = v} h(X + Z) - h(Z) \sim \mathcal{N}(0, v). \qquad (3)$$

Hence, in the single-user setting, when optimizing the mutual 
information as above, a Gaussian input is the best input for 
a Gaussian noise and a Gaussian noise is the worst noise for 
a Gaussian input. This provides a game equilibrium between 
user and nature, as defined in [7], p. 263. With these results, 
many problems in information theory dealing with Gaussian 
noise can be solved. However, in Gaussian networks, that is, 
in multi-user information theory problems where the external 
noise is Gaussian distributed, several new phenomena make 
the search for the optimal input ensemble more complex. 
Except for some specific cases of Gaussian networks, we still 
do not know how interference should be treated in general. Let 
us consider two users interfering on each other in addition 
to suffering from Gaussian external noise and say that the 
receivers treat interference as noise. Then, if the first user has 
drawn its code from a Gaussian ensemble, the second user 
faces a frustration phenomenon: using a Gaussian ensemble 
maximizes its mutual information but minimizes the mutual 
information of the first user. It is an open problem to find 
the optimal input distributions for this problem. This is one 
illustration of the complications appearing in the network 
setting. Another example is regarding the treatment of the 
fading. Over a single-user AWGN channel, whether the fading 
is deterministic or random, but known at the receiver, does 
not affect the optimal input distribution. From (1), it is clear 
that maximizing $I(X; X + Z)$ or $I(X; HX + Z \mid H)$ under 
an average power constraint is achieved by a Gaussian input. 
However, the situation is different if we consider a Gaussian 
broadcast channel (BC). When there is a deterministic fading, 
using (1) and (3), the optimal input distribution can be shown 
to be Gaussian. However, it has been an open problem to show 
whether Gaussian inputs are optimal or not for a Gaussian BC 
with a random fading known at the receiver, even if the fading 
is such that it is a degraded BC. 

A reason for these open questions in the network information 
theoretic framework is that Gaussian ensembles 
are roughly the only ensembles that can be analyzed over 
Gaussian networks, as non-Gaussian ensembles have left most 
problems in an intractable form. In this paper, a novel technique 
is developed to analyze a class of non-Gaussian input 
distributions over Gaussian noise channels. This technique is 
efficient for analyzing the competitive situations occurring in the 
network problems described below. In particular, it allows us to 
find certain non-Gaussian ensembles that outperform Gaussian 
ones on a Gaussian BC with coherent fading and on a 
two-user interference channel, and to disprove the 
strong Shamai-Laroia conjecture on the Gaussian intersymbol 
interference channel. This tool provides new insight into 
Gaussian networks and confirms that non-Gaussian ensembles 
do have a role to play in these networks. We now introduce 
the notion of competitive situations in more detail. 

A. Competitive Situations 

1) Fading Broadcast Channel: Consider a degraded Gaussian 
BC with coherent memoryless fading, where the fading 
is the same for both receivers, i.e., 

$$Y_1 = HX + Z_1,$$
$$Y_2 = HX + Z_2,$$

but $Z_1 \sim \mathcal{N}(0, v_1)$ and $Z_2 \sim \mathcal{N}(0, v_2)$, with $v_1 < v_2$. The 
input $X$ is subject to a power constraint denoted by $p$. Because 
the fading is coherent, each receiver also knows the realization 
of $H$ at each channel use. The fading and the noises are 
memoryless (iid) processes. Since this is a degraded broadcast 
channel, the capacity region is given by all rate pairs 

$$\big(I(X; Y_1 \mid U, H),\; I(U; Y_2 \mid H)\big)$$

with $U - X - (Y_1, Y_2)$. The optimal input distributions, i.e., the 
distributions of $(U, X)$ achieving the capacity region boundary, 
are given by the following optimization, where $\mu \in \mathbb{R}$, 

$$\arg\max_{(U,X):\, U - X - (Y_1, Y_2)} I(X; Y_1 \mid U, H) + \mu I(U; Y_2 \mid H). \qquad (4)$$

Note that the objective function in the above maximization is 
given by 

$$h(Y_1 \mid U, H) - h(Z_1) + \mu h(Y_2 \mid H) - \mu h(Y_2 \mid U, H).$$

Now, each term in this expression is individually maximized 
by a Gaussian distribution for $U$ and $X$, but these terms 
are combined with different signs, so there is a competitive 
situation and the maximizer is not obvious. When $\mu < 1$, 
one can show that Gaussian distributions are optimal. Also, 
if $H$ is compactly supported, and if $v$ is small enough 
to make the supports of $H$ and $1/vH$ non-overlapping, the 
optimal distribution of $(U, X)$ is jointly Gaussian (cf. [13]). 
However, in general the optimal distribution is unknown. We 
do not know whether this is because more theorems are needed, 
or whether, with fading, non-Gaussian codes can actually 
perform better than the Gaussian ones. 

2) Interference Channel: We consider the symmetric memoryless 
interference channel (IC) with two users and white 
Gaussian noise. The average power is denoted by $p$, the 
interference coefficients by $a$, and the respective noise by 
$Z_1$ and $Z_2$ (independent standard Gaussian). We define the 
following expression 

$$\begin{aligned} m\, S_{a,p}(X_1^m, X_2^m) &= I(X_1^m; X_1^m + aX_2^m + Z_1^m) + I(X_2^m; X_2^m + aX_1^m + Z_2^m) \\ &= h(X_1^m + aX_2^m + Z_1^m) - h(aX_2^m + Z_1^m) \\ &\quad + h(X_2^m + aX_1^m + Z_2^m) - h(aX_1^m + Z_2^m), \end{aligned} \qquad (5)$$

where $X_1^m$ and $X_2^m$ are independent random vectors of dimension 
$m$ with a covariance having a trace bounded by $mp$ and 
$Z_i^m$, $i = 1, 2$, are iid standard Gaussian. For any dimension 
$m$ and any distributions of $X_1^m$ and $X_2^m$, $S_{a,p}(X_1^m, X_2^m)$ is 
a lower bound to the sum-capacity. Moreover, it is tight by 
taking $m$ arbitrarily large and $X_1^m$ and $X_2^m$ maximizing (5). 
Now, a competitive situation similar to that of the fading broadcast 
problem takes place: Gaussian distributions maximize each 
entropy term, but these terms are combined with different 
signs. Would we then prefer to take $X_1$ and $X_2$ Gaussian 
or not? This should depend on the value of $a$. If $a = 0$, we 
have two parallel AWGN channels with no interference, and 
Gaussian inputs are optimal. We can then expect that this 
might still hold for small values of $a$. It has been proved 
recently in [2], [8], [10], that the sum-capacity is achieved 
by treating interference as noise and with iid Gaussian inputs, 
as long as $pa^3 + a - 1/2 \le 0$. Hence, in this regime, the iid 
Gaussian distribution maximizes (5) for any $m$. But if $a$ is 
above that threshold and below 1, the problem is open. 
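
As a numerical companion to the regime just described, the following Python sketch evaluates (5) for $m = 1$ with iid Gaussian inputs and tests the low-interference condition. It assumes that the condition reads $pa^3 + a - 1/2 \le 0$ and that rates are measured in nats; the function names are illustrative and are not taken from the references.

```python
import math


def tin_sum_rate(a: float, p: float) -> float:
    """Value of (5) for m = 1 with iid Gaussian inputs, in nats.

    Each receiver sees signal power p and interference-plus-noise
    power a^2 p + 1, so each user contributes 0.5 * log(1 + SINR).
    """
    sinr = p / (1.0 + a * a * p)
    return math.log(1.0 + sinr)


def in_low_interference_regime(a: float, p: float) -> bool:
    """True when p a^3 + a - 1/2 <= 0 (assumed form of the condition)."""
    return p * a**3 + a - 0.5 <= 0.0


if __name__ == "__main__":
    for a in (0.0, 0.4, 0.45, 1.0):
        print(a, tin_sum_rate(a, 1.0), in_low_interference_regime(a, 1.0))
```

With $a = 0$ the sketch returns the sum rate of two parallel AWGN channels, $\log(1 + p)$, as expected from the discussion above.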

Let us now review the notion of "treating interference as 
noise". For each user, we say that the decoder is treating 
interference as noise, if it does not require the knowledge of 
the other user's code book. However, we allow such decoders 
to have the knowledge of the distribution, under which the 
other user's code book may be drawn. This is for example 
necessary to construct a sum-capacity achieving code$^1$ in 
[8], [10], where the decoder of each user treats interference 
as noise but uses the fact that the other user's code book is 
drawn from an iid Gaussian distribution. But, if we allow 
this distribution to be of arbitrarily large dimension m in 
our definition of treating interference as noise, we can get a 
misleading definition. Indeed, no matter what a is, if we take 
$m$ large enough and a distribution of $X_1^m$, $X_2^m$ maximizing (5), 
we can achieve rates arbitrarily close to the sum-capacity, yet, 
formally treating interference as noise. The problem is that the 
maximizing distributions in (5) may not be iid for an arbitrary 
a, and knowing it at the receiver can be as much information 
as knowing the other user's code book (for example, if the 
distribution is the uniform distribution over a code book of 
small error probability). Hence, one has to be careful when 
taking m large. In this paper, we will only work with situations 
that are not ambiguous with respect to our definition of treating 
interference as noise. It is indeed an interesting problem 
to discuss what kind of m-dimensional distributions would 
capture the meaning of treating interference as noise that we 
want. This also points out that studying the maximizers of (5) 
relates to studying the concept of treating interference as noise 
or information. Since for any chosen distributions of the inputs 
we can achieve (5), the maximizers of (5) must have a different 
structure when $a$ grows. For $a$ small enough, iid Gaussian are 
maximizing distributions, but for $a > 1$, since we do not want 
to treat interference as noise, the maximizing distributions 
must have a "heavy structure", whose characterization requires 
as much information as giving the entire code book. This 
underlines that an encoder can be drawn from a distribution 
which does not maximize (5) for any value of $m$, and yet a 
decoder may exist that makes the code capacity achieving. 
This happens if $a > 1$: iid Gaussian inputs will achieve the 
sum-capacity if the receiver decodes the message of both users 
(one can show that the problem is equivalent to having two 
MAC's). However, if $a > 1$, the iid Gaussian distribution does 
not maximize $S_{a,p}(X_1, X_2)$ (for the dimension 1, hence for 
arbitrary dimensions). 

$^1$ In a low interference regime. 



In any case, if the Gaussian distribution does not maximize 
(5) for the dimension 1, it means that iid Gaussian inputs and 
treating interference as noise is not capacity achieving, since a 
code which treats interference as noise and whose encoder is 
drawn from a distribution can be capacity achieving only if the 
encoder is drawn from a distribution maximizing (5). Hence, 
understanding better how to resolve the competitive situation 
of optimizing (5) is a consequential problem for the interference 
channel. 



B. ISI channel and strong Shamai-Laroia Conjecture 

Conjecture 1: Let $h, p, v \in \mathbb{R}_+$, $X_1^G \sim \mathcal{N}(0, p)$ and $Z \sim \mathcal{N}(0, v)$ 
(independent of $X_1^G$). For all $X$, $X_1$ i.i.d. with mean 0 
and variance $p$, we have 

$$I(X; X + hX_1^G + Z) \le I(X; X + hX_1 + Z). \qquad (6)$$



This conjecture has been brought to our attention by 
Shlomo Shamai (Shitz), who referred to the strong conjecture 
for a slightly more general statement, where an arbitrary 
memory for the interference term is allowed, i.e., where 
$\sum_{i=1}^{n} h_i X_i$ stands for $hX_1$. The strong conjecture then 
claims that picking all $X_i$'s Gaussian gives a minimizer. 
However, we will show that even for the memory one case, 
the conjecture does not hold. The weak conjecture, also 
referred to as the Shamai-Laroia conjecture, corresponds to 
a specific choice of the $h_i$'s, which arises when using an 
MMSE decision feedback equalizer on a Gaussian noise ISI 
channel, cf. ID. This conjecture is investigated in a work in 
progress. 

There are many other examples in network information 
theory where such competitive situations occur. Our goal in 
this paper is to explore the degree of freedom provided by non- 
Gaussian input distributions. We show that the neighborhood 
of Gaussian distributions can be parametrized in a specific 
way, as to simplify greatly the computations arising in com- 
petitive situations. We will be able to precisely quantify how 
much a certain amount of non-Gaussianness, which we will 
characterize by means of the Hermite polynomials, affects or 
helps us in maximizing the competitive entropic functional of 
previously mentioned problems. 

II. Problem Statement 

A. Fading BC 

For the fading BC problem described in I-A1, 
we want to determine if/when the distribution of $(U, X)$ 
maximizing (4) is Gaussian or not. 

B. IC 



For the interference channel problem described in I-A2, we 
know from [2], [8], [10] that treating interference as noise and 
using iid Gaussian inputs is optimal when $pa^3 + a - 1/2 \le 0$. 
We question when this coding scheme is no longer optimal. 
More generally, we want to analyze the maximizers of (5). 

We distinguish the implication of such a threshold in both 
the synchronized and asynchronized users setting, as there will 
be an interesting distinction between these two cases. We recall 
how the synch and asynch settings are defined here. In the 
synch setting, each user of the IC sends their code words of 
a common block length $n$ simultaneously, i.e., at time 1, they 
both send the first component of their code word, at time 2 the 
second component, etc. In the asynch setting, each user is still 
using code words of the same block length $n$, however, there 
might be a shift between the time at which the first and second 
users start sending their code words. We denote this shift by 
$\tau$, and assume w.l.o.g. that $0 \le \tau \le n$. In the totally asynch 
setting, we assume that $\tau$ is drawn uniformly at random within 
$\{0, \ldots, n\}$. We may also distinguish the case where $\tau$ is 
known at the receiver but not at the transmitter from the case 
where $\tau$ is known at neither. Note that if iid input distributions are used 
to draw the code books, and interference is treated as noise, 
whether the users are synch or asynch does not affect the rate 
achievability$^2$. However, if the users want to time-share over 
the channel uses, so as to fully avoid their interference, they 
will need synchronization. 

Definition 1: Time sharing over a block length $n$ (assumed 
to be even) with Gaussian inputs refers to using $X_1$ Gaussian 
with covariance $2P I_{n/2}$ and $X_2$ Gaussian with covariance 
$2P I^c_{n/2}$, where $I_{n/2}$ is a diagonal matrix with $n/2$ 1's and 
0's, and $I^c_{n/2}$ flips the 1's and 0's on the diagonal. 
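
The covariances of Definition 1 can be written down explicitly. The sketch below is only an illustration: the definition does not say which $n/2$ positions carry the 1's, so placing them in the first half of the block is an assumption, and the function name is hypothetical.

```python
import numpy as np


def time_sharing_covariances(n: int, power: float):
    """Return the two diagonal covariances of Definition 1.

    Assumption: the n/2 ones of I_{n/2} occupy the first half of the
    diagonal; the flipped matrix then occupies the second half.
    """
    if n % 2 != 0:
        raise ValueError("block length n must be even")
    mask = np.zeros(n)
    mask[: n // 2] = 1.0
    cov_user1 = 2.0 * power * np.diag(mask)
    cov_user2 = 2.0 * power * np.diag(1.0 - mask)
    return cov_user1, cov_user2


if __name__ == "__main__":
    c1, c2 = time_sharing_covariances(4, 1.5)
    print(np.diag(c1), np.diag(c2))
```

Each covariance has trace $nP$, so the average power constraint is met, and the product of the two matrices is zero, which reflects that the users never transmit in the same channel use.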

C. ISI Channel and Strong Shamai-Laroia Conjecture 

We want to determine whether Conjecture 1 holds or not. 

D. General Problem 

Our more general goal is to understand better the problem 
posed by any competitive situations. For this purpose, we 
formulate the following mathematical problem. 

We start by changing the notation and rewrite (1) and (3) 
as 

$$\arg\max_{f:\, m_2(f) = p} h(f \star g_v) = g_p \qquad (7)$$
$$\arg\min_{f:\, m_2(f) = p} h(f \star g_v) - h(f) = g_p \qquad (8)$$

where $g_p$ denotes the Gaussian density with zero mean and 
variance $p$, and the functions $f$ are density functions on $\mathbb{R}$, 
i.e., positive functions integrating to 1, and having a well-defined 
entropy and second moment $m_2(f) = \int_{\mathbb{R}} x^2 f(x)\,dx$. 

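We consider the local geometry by looking at densities of 
the form 

$$f_\varepsilon(x) = g_p(x)(1 + \varepsilon L(x)), \quad x \in \mathbb{R}, \qquad (9)$$

where $L : \mathbb{R} \to \mathbb{R}$ satisfies 

$$\inf_{x \in \mathbb{R}} L(x) > -\infty \qquad (10)$$
$$\int_{\mathbb{R}} L(x) g_p(x)\,dx = 0. \qquad (11)$$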



$^2$ Hence, (5) with an iid distribution for $X_1$ and $X_2$ can still be defined for 
the totally asynch IC. 



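With these two constraints on $L$, $f_\varepsilon$ is a valid density for 
$\varepsilon$ sufficiently small. It is a perturbed Gaussian density, in a 
"direction" $L$. Observe that, 

$$m_1(f_\varepsilon) = 0 \ \text{ iff } \ M_1(L) = \int_{\mathbb{R}} x L(x) g_p(x)\,dx = 0 \qquad (12)$$
$$m_2(f_\varepsilon) = p \ \text{ iff } \ M_2(L) = \int_{\mathbb{R}} x^2 L(x) g_p(x)\,dx = 0. \qquad (13)$$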

We are now interested in analyzing how these perturbations 
affect the output distributions through an AWGN channel. 
Note that, if the input distribution is a Gaussian $g_p$ perturbed 
in the direction $L$, the output is a Gaussian $g_{p+v}$ perturbed in 
the direction $\frac{g_p L \star g_v}{g_{p+v}}$, since 

$$f_\varepsilon \star g_v = g_{p+v}\left(1 + \varepsilon\, \frac{(g_p L) \star g_v}{g_{p+v}}\right).$$

Convention: $g_p L \star g_v$ refers to $(g_p L) \star g_v$, i.e., the multiplicative 
operator precedes the convolution one. 

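For simplicity, let us assume in the following that the function 
$L$ is a polynomial satisfying (10), (11). 
Lemma 1: We have 

$$D(f_\varepsilon \| g_p) = \frac{1}{2}\varepsilon^2 \|L\|^2_{g_p} + o(\varepsilon^2)$$
$$D(f_\varepsilon \star g_v \| g_p \star g_v) = \frac{1}{2}\varepsilon^2 \left\| \frac{g_p L \star g_v}{g_{p+v}} \right\|^2_{g_{p+v}} + o(\varepsilon^2)$$

where 

$$\|L\|^2_{g_p} = \int_{\mathbb{R}} L^2(x) g_p(x)\,dx.$$

Moreover, note that for any density $f$, if $m_1(f) = a$ and 
$m_2(f) = p + a^2$, we have 

$$h(f) = h(g_{a,p}) - D(f \| g_{a,p}). \qquad (14)$$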
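
The first approximation of Lemma 1 can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the direction $L$ to be the fourth normalized Hermite polynomial (a choice made here because it is bounded below and has unit norm), and it replaces the integral by a Riemann sum on a wide grid. The function names are hypothetical.

```python
import numpy as np


def gaussian_density(x, var):
    """Zero-mean Gaussian density g_var evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def hermite4_normalized(x, var):
    """Degree-4 Hermite polynomial, normalized to unit norm under g_var."""
    y = x / np.sqrt(var)
    return (y**4 - 6.0 * y**2 + 3.0) / np.sqrt(24.0)


def divergence_to_gaussian(eps, var, grid):
    """Riemann-sum estimate of D(g_var (1 + eps L) || g_var), in nats."""
    dx = grid[1] - grid[0]
    direction = hermite4_normalized(grid, var)
    perturbed = gaussian_density(grid, var) * (1.0 + eps * direction)
    return float(np.sum(perturbed * np.log1p(eps * direction)) * dx)


if __name__ == "__main__":
    grid = np.linspace(-12.0, 12.0, 20001)
    for eps in (0.01, 0.005, 0.002):
        exact = divergence_to_gaussian(eps, 1.0, grid)
        print(eps, exact, 0.5 * eps**2)
```

As $\varepsilon$ decreases, the ratio between the computed divergence and $\frac{1}{2}\varepsilon^2$ approaches 1, which is the content of the $o(\varepsilon^2)$ term.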

Hence, the extremal entropic results of (7) and (8) are locally 
expressed as 

$$\arg\min_{L:\, M_2(L) = 0} \left\| \frac{g_p L \star g_v}{g_{p+v}} \right\|^2_{g_{p+v}} = 0, \qquad (15)$$
$$\arg\max_{L:\, M_2(L) = 0} \left\| \frac{g_p L \star g_v}{g_{p+v}} \right\|^2_{g_{p+v}} - \|L\|^2_{g_p} = 0, \qquad (16)$$

where 0 denotes here the zero function. If (15) is obvious, 
(16) requires a proof, which is given in Section V. Let us 
define the following mapping, 

$$\Gamma^{(+)} : L \in L_2(g_p; \mathbb{R}) \mapsto \frac{g_p L \star g_v}{g_{p+v}} \in L_2(g_{p+v}; \mathbb{R}), \qquad (17)$$

where $L_2(g_p; \mathbb{R})$ denotes the space of real functions having a 
finite $\|\cdot\|_{g_p}$ norm. This linear mapping gives, for a given 
perturbed direction $L$ of a Gaussian input $g_p$, the resulting 
perturbed direction of the output through additive Gaussian 
noise $g_v$. The norm of each direction in their respective spaces, 
i.e., in $L_2(g_p)$ and $L_2(g_{p+v})$, gives how far from the Gaussian 
distribution these perturbations are (up to a scaling factor). 
Note that if $L$ satisfies (10)-(11), so does $\Gamma^{(+)}L$ for the 
measure $g_{p+v}$. The result in (16) (worst noise case) tells us 
that this mapping is a contraction, but for our goal, what would 
be helpful is a spectral analysis of this operator, to allow more 
quantitative results than the extreme-case results of (15) and 
(16). 

In order to do so, one can express $\Gamma^{(+)}$ as an operator 
defined and valued in the same space, namely $L_2$ with the 
Lebesgue measure $\lambda$, which is done by inserting the Gaussian 
measure in the operator argument. We then proceed to a 
singular function/value analysis. Formally, let $K = L\sqrt{g_p}$, 
which gives $\|K\|_\lambda = \|L\|_{g_p}$, and let 

$$\Lambda : K \in L_2(\lambda; \mathbb{R}) \mapsto \frac{\sqrt{g_p}\, K \star g_v}{\sqrt{g_{p+v}}} \in L_2(\lambda; \mathbb{R}), \qquad (18)$$

which gives $\|\Gamma^{(+)}L\|_{g_{p+v}} = \|\Lambda K\|_\lambda$. Denoting by $\Lambda^*$ the 
adjoint operator of $\Lambda$, we want to find the singular functions 
of $\Lambda$, i.e., the eigenfunctions $K$ of $\Lambda^*\Lambda$: 

$$\Lambda^*\Lambda K = \gamma K.$$

III. Results 

A. General Result: Local Geometry and Hermite Coordinates 

The following theorem gives the singular functions and 
values of the operator $\Lambda$ defined in the previous section. 

Theorem 1: 

$$\Lambda^*\Lambda K = \gamma K, \quad K \neq 0$$

holds for each pair 

$$(K, \gamma) \in \left\{ \left( \sqrt{g_p}\, H_k^{[p]},\ \left(\frac{p}{p+v}\right)^{k} \right) \right\}_{k \ge 0},$$

where 

$$H_k^{[p]}(x) = \frac{1}{\sqrt{k!}} H_k(x/\sqrt{p}),$$
$$H_k(x) = (-1)^k e^{x^2/2} \frac{d^k}{dx^k} e^{-x^2/2}, \quad k \ge 0,\ x \in \mathbb{R}.$$

The polynomials $H_k^{[p]}$ are the normalized Hermite polynomials 
(for a Gaussian distribution having variance $p$) and $\sqrt{g_p}\, H_k^{[p]}$ 
are called the Hermite functions. For any $p > 0$, $\{H_k^{[p]}\}_{k \ge 0}$ 
is an orthonormal basis of $L_2(g_p)$; this can be found in [12]. 
One can check that $H_1$, respectively $H_2$, perturbs a Gaussian 
distribution into another Gaussian distribution, with a different 
first moment, respectively second moment. For $k \ge 3$, the $H_k$ 
perturbations do not modify the first two moments and 
move away from Gaussian distributions. Since $H_0^{[p]} = 1$, 
the orthogonality property implies that $H_k^{[p]}$ satisfies (11) for 
any $k > 0$. However, it is formally only for even values of $k$ 
that (10) is verified (although we will see in Section V that 
essentially any $k$ can be considered in our problems). The 
following result contains the property of Hermite polynomials 
mostly used in our problems, and expresses Theorem 1 with 
the Gaussian measures. 




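Theorem 2: 

$$\Gamma^{(+)} H_k^{[p]} = \frac{g_p H_k^{[p]} \star g_v}{g_{p+v}} = \left(\frac{p}{p+v}\right)^{k/2} H_k^{[p+v]}, \qquad (19)$$
$$\Gamma^{(-)} H_k^{[p+v]} = H_k^{[p+v]} \star g_v = \left(\frac{p}{p+v}\right)^{k/2} H_k^{[p]}. \qquad (20)$$

Theorem 2 implies Theorem 1, since 

$$\Gamma^{(-)}\Gamma^{(+)} L = \gamma L \iff \Lambda^*\Lambda K = \gamma K$$

for 

$$K = L\sqrt{g_p}.$$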

Comment: the results that we have just derived are related 
to properties of the Ornstein-Uhlenbeck process. 

Summary: In words, we just saw that $H_k$ is an eigenfunction 
of the input/output perturbation operator $\Gamma^{(+)}$, in the sense 
that $\Gamma^{(+)} H_k^{[p]} = \left(\frac{p}{p+v}\right)^{k/2} H_k^{[p+v]}$. Hence, over an additive 
Gaussian noise channel $g_v$, if we perturb the input $g_p$ in the 
direction $H_k^{[p]}$ by an amount $\varepsilon$, we will perturb the output 
$g_{p+v}$ in the direction $H_k^{[p+v]}$ by an amount $\left(\frac{p}{p+v}\right)^{k/2} \varepsilon$. Such 
a perturbation in $H_k$ implies that the output entropy is reduced 
(compared to not perturbing) by $\frac{\varepsilon^2}{2}\left(\frac{p}{p+v}\right)^{k}$ (if $k \ge 3$). 
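
The eigenfunction property summarized above lends itself to a direct numerical check. The following sketch is a hedged illustration: it assumes the normalization $H_k^{[p]}(x) = H_k(x/\sqrt{p})/\sqrt{k!}$ with probabilists' Hermite polynomials (evaluated with NumPy's `hermite_e` module) and approximates the convolution by a Riemann sum. The helper names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def gaussian_density(x, var):
    """Zero-mean Gaussian density g_var evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def hermite_normalized(k, x, var):
    """Normalized Hermite polynomial H_k^{[var]}(x) (assumed normalization)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select the degree-k probabilists' Hermite polynomial
    return hermeval(x / math.sqrt(var), coeffs) / math.sqrt(math.factorial(k))


def output_direction(k, p, v, y, x):
    """(g_p H_k^{[p]} * g_v) / g_{p+v} at points y, integrating over grid x."""
    dx = x[1] - x[0]
    weighted_input = gaussian_density(x, p) * hermite_normalized(k, x, p)
    noise = gaussian_density(y[:, None] - x[None, :], v)
    convolution = (weighted_input[None, :] * noise).sum(axis=1) * dx
    return convolution / gaussian_density(y, p + v)


if __name__ == "__main__":
    p, v, k = 1.0, 0.5, 3
    x = np.linspace(-15.0, 15.0, 6001)
    y = np.linspace(-2.0, 2.0, 9)
    lhs = output_direction(k, p, v, y, x)
    rhs = (p / (p + v)) ** (k / 2) * hermite_normalized(k, y, p + v)
    print(np.max(np.abs(lhs - rhs)))
```

The printed discrepancy should be at the level of numerical round-off, for any degree $k$ and any positive $p$, $v$ for which the grid is wide enough.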



B. Fading BC Result 

The following result states that the capacity region of a 
degraded fading BC with Gaussian noise is not achieved by a 
Gaussian superposition code in general. 

Theorem 3: Let 

$$Y_1 = HX + Z_1,$$
$$Y_2 = HX + Z_2,$$

with $X$ such that $\mathbb{E}X^2 \le p$, $Z_1 \sim \mathcal{N}(0, v)$, $0 < v < 1$, $Z_2 \sim 
\mathcal{N}(0, 1)$ and $H$, $X$, $Z_1$, $Z_2$ mutually independent. There exists 
a fading distribution and a value of $v$ for which the capacity 
achieving input distribution is non-Gaussian. More precisely, 
let $U$ be any auxiliary random variable, with $U - X - (Y_1, Y_2)$. 
Then, there exist $p$, $v$, a distribution of $H$ and $\mu$ such that 

$$(U, X) \mapsto I(X; Y_1 \mid U, H) + \mu I(U; Y_2 \mid H) \qquad (21)$$

is maximized by a non jointly Gaussian distribution. 

In the proof, we present a counter-example to Gaussian 
being optimal for H binary. In order to defeat Gaussian dis- 
tributions, we construct input distributions using the Hermite 
coordinates. The proof also gives a condition on the fading 
distribution and the noise variance v for which a non-Gaussian 
distribution strictly improves on the Gaussian one. 



C. IC Result 

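Definition 2: Let 

$$F_k(a, p) = \lim_{\delta \searrow 0} \lim_{\varepsilon \searrow 0} \frac{1}{\varepsilon^2} \left[ S_{a,p}(X_1, X_2) - S_{a,p}(X_1^G, X_2^G) \right],$$

where $X_1^G, X_2^G \overset{\text{iid}}{\sim} g_p$, $X_1 \sim g_p(1 + \varepsilon \tilde{H}_k)$ and $X_2 \sim 
g_p(1 - \varepsilon \tilde{H}_k)$, with $\tilde{H}_k$ defined in (22) below (as explained in 
Section IV, $\tilde{H}_k$ is a formal modification of $H_k$ to ensure the 
positivity of the perturbed densities). 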

In other words, $F_k(a, p)$ represents the gain (positive or 
negative) of using $X_1$ perturbed along $H_k$ and $X_2$ perturbed 
along $-H_k$ with respect to using Gaussian distributions. Note 
that the distributions we chose for $X_1$ and $X_2$ are not the most 
general ones, as we could have chosen arbitrary directions 
spanned by the Hermite basis to perturb the Gaussian densities. 
However, as explained in the proof of Theorem 4, this 
choice is sufficient for our purpose. 
Theorem 4: We have for $k \ge 2$ 

$$F_k(a, p) = \left(\frac{pa^2}{pa^2 + 1}\right)^{k} - \left(\frac{p}{pa^2 + p + 1}\right)^{k} (a^k - 1)^2.$$

For any fixed p, the function Fk{-,p) has a unique positive 
root, below which it is negative and above which it is positive. 

Theorem 5: Treating interference as noise with iid Gaussian 
inputs does not achieve the sum-capacity of the symmetric IC 
(synch or asynch) and is outperformed by $X_1 \sim g_p(1 + \varepsilon \tilde{H}_3)$ 
and $X_2 \sim g_p(1 - \varepsilon \tilde{H}_3)$, if $F_3(a, p) > 0$. 
This theorem is a direct consequence of Theorem 4. 

Proposition 1: For the symmetric synch IC, time sharing 
improves on treating interference as noise with iid Gaussian 
distribution if $F_2(a, p) > 0$. 

We now introduce the following definition. 

Definition 3: Blind time sharing over a block length n 
(assumed to be even) between two users, refers to sending 
non-zero power symbols only at the instances marked with 
a 1 in (1, 0, 1, 0, 1, 0, . . . 1, 0) for the first user, and zero 
power symbols only at the instances marked with a 1 in 
(1, 1, . . . , 1, 0, 0, . . . , 0) for the second user. 

Proposition 2: For the symmetric totally asynch IC, if the 
receivers (but not transmitters) know the asynchronization 
delay, blind time sharing improves on treating interference as 
noise with iid Gaussian distributions if $B_2(a, p) > 0$, where 
B2ia,p) = l(log(l + 2p)+log(l+j^))-log(l + ^). 
If the receivers do not know the asynchronization delay, blind 
time sharing cannot improve on treating interference as noise 
with iid Gaussian distributions if $B_2(a, p) < 0$. 

How to read these results: We have four thresholds to keep 
track of: 

• $T_1(p)$ is when $pa^3 + a - \frac{1}{2} = 0$. If $a \le T_1(p)$, we know 
from [2], [8], [10] that iid Gaussian inputs and treating 
interference as noise is sum-capacity achieving. 

• $T_2(p)$ is when $F_2(a, p) = 0$. If $a > T_2(p)$, we know 
from Prop. 1 that, if synchronization is permitted, time 
sharing improves on treating interference as noise with iid 
Gaussian inputs. This regime matches with the so-called 
moderate regime defined in [5]. 

• $T_3(p)$ is when $F_3(a, p) = 0$. If $a > T_3(p)$, we know from 
Theorem 5 that treating interference as noise with iid non-Gaussian 
distributions (opposites in $H_3$) improves on the 
iid Gaussian ones. 

• $T_4(p)$ is when $B_2(a, p) = 0$. If $a > T_4(p)$, we know from 
Prop. 2 that, even if the users are totally asynchronized, 
but if the receivers know the asynchronization delay, blind 
time sharing improves on treating interference as noise 
with iid Gaussian inputs. If the receivers do not know 
the delay, the threshold can only appear for larger values 
of $a$. 

The question is now how these thresholds are ranked. It turns 
out that $0 < T_1(p) < T_2(p) < T_3(p) < T_4(p)$. And if $p = 
1$, the above inequality reads as $0.424 < 0.605 < 0.680 < 
1.031$. This implies the following for a decoder that treats 
interference as noise. Since $T_2(p) < T_3(p)$, it is first better 
to time share than to use non-Gaussian distributions along $H_3$. 
But this is useful only if time-sharing is permitted, i.e., for the 
synch IC. However, for the asynchronized IC, since $T_3(p) < 
T_4(p)$, we are better off using the non-Gaussian distributions 
along $H_3$ before a Gaussian input scheme, even with blind 
time-sharing, and even if the receiver could know the delay. 
We notice that there is still a gap between $T_1(p)$ and $T_2(p)$, 
and we cannot say if, in this range, iid Gaussian inputs are 
still optimal, or if another class of non-Gaussian inputs (far 
away from Gaussians) can outperform them. In [4], another 
technique (which is related to ours but not equivalent) is used 
to find regimes where non-Gaussian inputs can improve on 
Gaussian ones on the same problem that we consider here. 
The threshold found in [4] is equal to 0.925 for $p = 1$, which 
is looser than the value of 0.680 found here. 
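
The first three thresholds quoted for $p = 1$ can be reproduced with a few lines of Python. This is a sketch under explicit assumptions: it takes $T_1(p)$ to be the root of $pa^3 + a - 1/2$ and it takes the closed form $F_k(a,p) = \left(\frac{pa^2}{pa^2+1}\right)^k - \left(\frac{p}{pa^2+p+1}\right)^k (a^k - 1)^2$ for the gain of Theorem 4, a form chosen here because its roots match the quoted values. The bisection helper and all function names are hypothetical, and $T_4(p)$ is not computed.

```python
def low_interference_margin(a: float, p: float) -> float:
    """p a^3 + a - 1/2, whose positive root is taken to be T_1(p)."""
    return p * a**3 + a - 0.5


def hermite_gain(k: int, a: float, p: float) -> float:
    """Assumed closed form of F_k(a, p)."""
    interference_term = (p * a * a / (p * a * a + 1.0)) ** k
    output_term = (p / (p * a * a + p + 1.0)) ** k * (a**k - 1.0) ** 2
    return interference_term - output_term


def bisect_root(func, low: float, high: float, tol: float = 1e-12) -> float:
    """Plain bisection; requires a sign change of func on [low, high]."""
    f_low = func(low)
    if f_low * func(high) > 0:
        raise ValueError("root is not bracketed")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if f_low * func(mid) <= 0:
            high = mid
        else:
            low, f_low = mid, func(mid)
    return 0.5 * (low + high)


def thresholds(p: float):
    """Return (T_1, T_2, T_3) for power p, searching a in (0, 1)."""
    t1 = bisect_root(lambda a: low_interference_margin(a, p), 1e-9, 1.0)
    t2 = bisect_root(lambda a: hermite_gain(2, a, p), 1e-9, 1.0)
    t3 = bisect_root(lambda a: hermite_gain(3, a, p), 1e-9, 1.0)
    return t1, t2, t3


if __name__ == "__main__":
    print([round(t, 3) for t in thresholds(1.0)])
```

Searching in $(0, 1)$ is sufficient here because the assumed $F_k$ is negative as $a \to 0$ and positive at $a = 1$.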

Finally, the following interesting and curious fact has also 
been noticed. In Theorem 4, we require $k \ge 2$. Nevertheless, 
if we plug $k = 1$ into the right hand side of Theorem 4 
and ask for this expression to be positive, we precisely get 
$pa^3 + a - \frac{1}{2} > 0$, i.e., the complement range delimited by 
$T_1(p)$. However, the right hand side of Theorem 4 for $k = 1$ 
is not equal to $F_1(a, p)$ (this is explained in more detail in 
the proof of Theorem 4). Indeed, it would not make sense 
that moving along $H_1$, which changes the mean with a fixed 
second moment within Gaussians, would allow us to improve 
on the iid Gaussian scheme. Yet, getting to the exact same 
condition, when working on the problem of improving on the 
iid Gaussian scheme, seems to be a strange coincidence. 

D. Strong Shamai-Laroia Conjecture 

We show in Section V that Conjecture 1 does not hold. We 
provide counter-examples to the conjecture, pointing out that 
the range of $h$ for which the conjecture does not hold increases 
with SNR. 

IV. Hermite Coding: Formalities 

The Hermite polynomial corresponding to $k = 0$ is $H_0^{[p]} = 1$ 
and is clearly not a valid direction as it violates (11). Using 
the orthogonality property of the Hermite basis and since 
$H_0^{[p]} = 1$, we conclude that $H_k^{[p]}$ satisfies (11) for any $k > 0$. 
However, it is only for $k$ even that $H_k^{[p]}$ satisfies (10). On the 
other hand, for any $\delta > 0$, we have that $H_k^{[p]} + \delta H_{4k}^{[p]}$ satisfies 
(10), whether $k$ is even or not (we chose $4k$ instead of $2k$ for 
reasons that will become clear later). Now, if we consider the 
direction $-H_k^{[p]}$, (10) is not satisfied for both $k$ even and odd. 
But again, for any $\delta > 0$, we have that $-H_k^{[p]} + \delta H_{4k}^{[p]}$ satisfies 
(10). Hence, in order to ensure (10), we will often work in the 
proofs with $\pm H_k^{[p]} + \delta H_{4k}^{[p]}$, although it will essentially allow us 
to reach the performance achieved by any $\pm H_k^{[p]}$ (odd or even), 
since we will then take $\delta$ arbitrarily small and use continuity 
arguments. 

Convention: We drop the variance upper script in the 
Hermite terms whenever a Gaussian density with specified 
variance is perturbed, i.e., the density $g_p(1 + \varepsilon H_k)$ always 
denotes $g_p(1 + \varepsilon H_k^{[p]})$, and $g_p H_k$ always denotes $g_p H_k^{[p]}$, no 
matter what $p$ is. Same treatment is done for $\|\cdot\|_{g_p}$ and $\|\cdot\|$. 

Now, in order to evaluate the entropy of a perturbation, 
i.e., $h(g_p(1 + \varepsilon L))$, we can express it as the entropy $h(g_p)$ 
minus the divergence gap, as in (14), and then use Lemma 1 
for the approximation. But this is correct only if $g_p(1 + \varepsilon L)$ has 
the same first two moments as $g_p$. Hence, if $L$ contains only 
$H_k$'s with $k \ge 3$, the previous argument can be used. But if 
$L$ contains $H_1$ and/or $H_2$ terms, the situation can be different. 
The next lemma describes this. 

Lemma 2: Let $\delta, p > 0$ and 

$$b\tilde{H}_k^{[p]} = \begin{cases} b\big(H_k^{[p]} + \delta H_{4k}^{[p]}\big), & \text{if } b \ge 0, \\ b\big(H_k^{[p]} - \delta H_{4k}^{[p]}\big), & \text{if } b < 0. \end{cases} \qquad (22)$$

We have for any $\alpha_k \in \mathbb{R}$, $k \ge 1$, $\varepsilon > 0$ 

$$h\Big(g_p\Big(1 + \varepsilon \sum_{k \ge 1} \alpha_k \tilde{H}_k\Big)\Big) = h(g_p) - D\Big(g_p\Big(1 + \varepsilon \sum_{k \ge 1} \alpha_k \tilde{H}_k\Big) \Big\| g_p\Big) + \frac{\varepsilon \alpha_2}{\sqrt{2}}.$$

Finally, when we convolve two perturbed Gaussian distributions, 
we get $g_a(1 + \varepsilon H_j) \star g_b(1 + \varepsilon H_k) = g_{a+b} + \varepsilon[g_a H_j \star 
g_b + g_a \star g_b H_k] + \varepsilon^2 g_a H_j \star g_b H_k$. We already know from 
Theorem 2 how to describe the terms in $\varepsilon$; what we still need 
is to describe the term in $\varepsilon^2$. We have the following. 
Lemma 3: We have 

$$g_a H_k^{[a]} \star g_b H_l^{[b]} = C\, g_{a+b} H_{k+l}^{[a+b]},$$

where $C$ is a constant depending only on $a$, $b$, $k$ and $l$. In 
particular if $k = l = 1$, we have $C = \frac{\sqrt{2ab}}{a+b}$. 
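
The special case $k = l = 1$ of Lemma 3 can be checked numerically. The sketch below is an illustration that assumes the constant $C = \sqrt{2ab}/(a+b)$ and the explicit forms $H_1^{[a]}(x) = x/\sqrt{a}$ and $H_2^{[c]}(x) = (x^2/c - 1)/\sqrt{2}$ of the normalized Hermite polynomials; the convolution is approximated by a Riemann sum and the function names are hypothetical.

```python
import math

import numpy as np


def gaussian_density(x, var):
    """Zero-mean Gaussian density g_var evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def first_order_convolution(a, b, y, x):
    """Riemann-sum value of (g_a H_1^{[a]}) * (g_b H_1^{[b]}) at points y."""
    dx = x[1] - x[0]
    left = gaussian_density(x, a) * x / math.sqrt(a)
    shifted = y[:, None] - x[None, :]
    right = gaussian_density(shifted, b) * shifted / math.sqrt(b)
    return (left[None, :] * right).sum(axis=1) * dx


def lemma3_prediction(a, b, y):
    """C g_{a+b} H_2^{[a+b]} with the assumed constant C = sqrt(2ab)/(a+b)."""
    c = a + b
    constant = math.sqrt(2.0 * a * b) / c
    second_hermite = (y**2 / c - 1.0) / math.sqrt(2.0)
    return constant * gaussian_density(y, c) * second_hermite


if __name__ == "__main__":
    x = np.linspace(-20.0, 20.0, 8001)
    y = np.linspace(-3.0, 3.0, 7)
    gap = first_order_convolution(1.0, 2.0, y, x) - lemma3_prediction(1.0, 2.0, y)
    print(np.max(np.abs(gap)))
```

The check relies on the fact that $g_a H_1^{[a]}$ is proportional to the derivative of $g_a$, so the convolution is proportional to the second derivative of $g_{a+b}$.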



V. Proofs 

We start by reviewing the proof of (16), as it brings 
interesting facts. We then prove the main result. 
Proof of (16): 

We first assume that $f_\varepsilon$ has zero mean and variance $p$. Using 
the Hermite basis, we express $L$ as $L = \sum_{k \ge 3} \alpha_k H_k^{[p]}$ ($L$ must 
have such an expansion, since it must have a finite $L_2(g_p)$ 
norm, to make sense of the original expressions). Using (19), 
we can then express (16) as 

$$\sum_{k \ge 3} \alpha_k^2 \left(\frac{p}{p+v}\right)^{k} - \sum_{k \ge 3} \alpha_k^2, \qquad (23)$$

which is clearly negative. Hence, we have proved that 

$$\left\| \frac{g_p L \star g_v}{g_{p+v}} \right\|_{g_{p+v}} \le \|L\|_{g_p} \qquad (24)$$

and (16) is maximized by taking $L = 0$. Note that we can get 
tighter bounds than the one in the previous inequality; indeed the 
tightest, holding for $H_3$, is given by 

$$\left\| \frac{g_p L \star g_v}{g_{p+v}} \right\|^2_{g_{p+v}} \le \left(\frac{p}{p+v}\right)^{3} \|L\|^2_{g_p} \qquad (25)$$

(this clearly holds if written as a series like in (23)). Hence, 
locally the contraction property can be tightened, and locally, 
we have stronger EPI's, or worst noise case. Namely, if $\nu \ge 
\left(\frac{p}{p+v}\right)^{3}$, we have 

$$\arg\min_{f:\, m_1(f) = 0,\, m_2(f) = p} h(f \star g_v) - \nu h(f) = g_p \qquad (26)$$

and if $\nu < \left(\frac{p}{p+v}\right)^{3}$, $g_p$ is outperformed by non-Gaussian 
distributions. Now, if we consider the constraint $m_2(f) \le p$, 
which, in particular, allows us to consider $m_1(f) > 0$ and 
$m_2(f) = p$, we get that if $\nu \ge \frac{p}{p+v}$, 

$$\arg\min_{f:\, m_2(f) \le p} h(f \star g_v) - \nu h(f) = g_p \qquad (27)$$

and if $\nu < \frac{p}{p+v}$, $g_p$ is outperformed by $g_{p-\delta}$ for some $\delta > 0$. 
It would then be interesting to study if these tighter results 
hold in a greater generality than for the local setting. 

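Proof of Theorem 2: 
We want to show 

$$g_p H_k^{[p]} \star g_v = \left(\frac{p}{p+v}\right)^{k/2} g_{p+v} H_k^{[p+v]},$$

which is proved by an induction on $k$, using the following 
properties (Appell sequence and recurrence relation) of Hermite 
polynomials: 

$$\frac{d}{dx} H_{k+1}^{[p]}(x) = \sqrt{\frac{k+1}{p}}\, H_k^{[p]}(x), \qquad -\frac{d}{dx}\big[g_p(x) H_k^{[p]}(x)\big] = \sqrt{\frac{k+1}{p}}\, g_p(x) H_{k+1}^{[p]}(x).$$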

Proof of Theorem 3: 
We refer to (21) as the mu-rate. Let us first consider Gaussian 
codes, i.e., when $(U, X)$ is jointly Gaussian, and see what 
mu-rate they can achieve. Without loss of generality, we can 
assume that $X = U + V$, with $U$ and $V$ independent and 
Gaussian, with respective variance $Q$ and $R$ satisfying $P = 
Q + R$. Then, (21) becomes 

$$\frac{1}{2}\mathbb{E}\log\left(1 + \frac{RH^2}{v}\right) + \mu\, \frac{1}{2}\mathbb{E}\log\frac{1 + PH^2}{1 + RH^2}. \qquad (28)$$

Now, we pick a $\mu$ and look for the optimal power $R$ that must 
be allocated to $V$ in order to maximize the above expression. 
We are interested in cases for which the optimal $R$ is not at 
the boundary but at an extremum of (28), and if the maximum 
is unique, the optimal $R$ is found by the first derivative check, 
which gives $\mathbb{E}\frac{H^2}{v + RH^2} = \mu\, \mathbb{E}\frac{H^2}{1 + RH^2}$. Since we will look for $\mu$, 
$v$, with $R > 0$, the previous condition can be written as 

$$\mathbb{E}\frac{RH^2}{v + RH^2} = \mu\, \mathbb{E}\frac{RH^2}{1 + RH^2}. \qquad (29)$$

We now check if we can improve on (28) by moving away from the optimal jointly Gaussian $(U, X)$. There are several ways to perturb $(U, X)$; we consider first the following case. We keep $U$ and $V$ independent, but perturb them away from Gaussians in the following way:

$$p_U(u) = g_Q(u)\left(1 + \varepsilon\left(H_3^{[Q]}(u) + \delta H_4^{[Q]}(u)\right)\right) \tag{30}$$

$$p_V(v) = g_R(v)\left(1 - \varepsilon\left(H_3^{[R]}(v) - \delta H_4^{[R]}(v)\right)\right) \tag{31}$$

with $\varepsilon, \delta > 0$ small enough. Note that these are valid density functions and that they preserve the first two moments of $U$ and $V$. The reason why we add $\delta H_4$ is to ensure that (10) is satisfied, but we will see that for our purpose, this can essentially be neglected. Then, using Lemma 2, the new distribution of $X$ is given by

$$p_X(x) = g_P(x)\left(1 + \varepsilon\left[\left(\frac{Q}{P}\right)^{3/2} - \left(\frac{R}{P}\right)^{3/2}\right] H_3^{[P]}(x)\right) + f(\delta),$$

where $f(\delta) = \delta\, g_P(x)\, \varepsilon \left(\left(\frac{Q}{P}\right)^{2} + \left(\frac{R}{P}\right)^{2}\right) H_4^{[P]}(x)$, which tends to zero when $\delta$ tends to zero. Now, by picking $P = 2R$, we have

$$p_X(x) = g_P(x) + f(\delta). \tag{32}$$

Hence, by taking $\delta$ arbitrarily small, the distribution of $X$ is arbitrarily close to the Gaussian distribution with variance $P$.
We now want to evaluate how these Hermite perturbations perform, given that we want to maximize (21), i.e.,

$$h(Y_1 \mid U, H) - h(Z_1) + \mu\, h(Y_2 \mid H) - \mu\, h(Y_2 \mid U, H). \tag{33}$$

We wonder if, by moving away from Gaussian distributions, the gain achieved for the term $-h(Y_2 \mid U, H)$ is higher than the loss suffered from the other terms. Using Theorem 2, Lemma 1 and Lemma 2, we are able to precisely measure this and we get

$$h(Y_1 \mid U = u, H = h) = h\!\left(g_{hu,\, v+Rh^2}\left(1 - \varepsilon\left(\frac{Rh^2}{v + Rh^2}\right)^{3/2} H_3^{[hu,\, v+Rh^2]}\right)\right) + o(\delta)$$

$$= \frac{1}{2}\log 2\pi e\,(v + Rh^2) - \frac{\varepsilon^2}{2}\left(\frac{Rh^2}{v + Rh^2}\right)^{3} + o(\varepsilon^2) + o(\delta),$$

$$h(Y_2 \mid U = u, H = h) = \frac{1}{2}\log 2\pi e\,(1 + Rh^2) - \frac{\varepsilon^2}{2}\left(\frac{Rh^2}{1 + Rh^2}\right)^{3} + o(\varepsilon^2) + o(\delta),$$

and because of (32),

$$h(Y_2 \mid H = h) = \frac{1}{2}\log 2\pi e\,(1 + Ph^2) + o(\varepsilon^2) + o(\delta).$$

Therefore, collecting all terms, we find that for $p_U$ and $p_V$ defined in (30) and (31), expression (33) reduces to

$$I_G + \frac{\varepsilon^2}{2}\,\mathbb{E}\left[\mu\left(\frac{RH^2}{1 + RH^2}\right)^{3} - \left(\frac{RH^2}{v + RH^2}\right)^{3}\right] + o(\varepsilon^2) + o(\delta), \tag{34}$$

where $I_G$ is equal to (28) (and is the mu-rate obtained with Gaussian inputs). Hence, if for some distribution of $H$ and some $v$, we have that

$$\mathbb{E}\left[\mu\left(\frac{RH^2}{1 + RH^2}\right)^{k} - \left(\frac{RH^2}{v + RH^2}\right)^{k}\right] > 0, \tag{35}$$

when $k = 3$ and $R$ is optimal for $\mu$, we can take $\varepsilon$ and $\delta$ small enough in order to make (34) strictly larger than $I_G$. We have shown how, if verified, inequality (35) leads to counter-examples of the Gaussian optimality, but with similar expansions, we would also get counter-examples if the inequality holds for any power $k$ instead of 3, as long as $k \geq 3$. Let us summarize what we obtained. Let $R$ be optimal for $\mu$, which means that (29) holds if there is only one maximum (not at the border). Then, non-Gaussian codes along Hermite's strictly outperform Gaussian codes if, for some $k \geq 3$, (35) holds. If the maximum is unique, this becomes

$$\frac{\mathbb{E}\,T(v)^k}{\mathbb{E}\,T(1)^k} < \frac{\mathbb{E}\,T(v)}{\mathbb{E}\,T(1)},$$

where

$$T(v) = \frac{RH^2}{v + RH^2}.$$

So we want the Jensen gap of $T(v)$ for the power $k$ to be small enough compared to the Jensen gap of $T(1)$.

We now give an example of a fading distribution for which the above conditions can be verified. Let $H$ be binary, taking values 1 and 10 with probability one half each, and let $v = 1/4$. Let $\mu = 5/4$; then for any value of $P$, the maximizer of (28) is at $R = 0.62043154$, cf. Figure 1, which corresponds in this case to the unique value of $R$ for which (29) is satisfied. Hence, if $P$ is larger than this value of $R$, there is a corresponding fading BC for which the best Gaussian code splits the power on $U$ and $V$ with $R = 0.62043154$ to achieve the best mu-rate with $\mu = 5/4$. To fit the counter-examples with the choice of Hermite perturbations made previously, we pick $P = 2R$. Finally, for these values of $\mu$ and $R$, (35) can be verified for $k = 8$, cf. Figure 2, and the corresponding Hermite code (along $H_8$) strictly outperforms any Gaussian code.
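The numbers in this example can be reproduced with a few lines of code. The following is a minimal sketch, assuming that the stationarity condition (29) reads $\mathbb{E}\,T(v) = \mu\,\mathbb{E}\,T(1)$ with $T(v) = RH^2/(v + RH^2)$ and that the left-hand side of (35) is $\mathbb{E}[\mu\,T(1)^k - T(v)^k]$. The function names and the bisection bracket $[0.1, 2]$ are hypothetical choices made for illustration.

```python
H_VALUES = (1.0, 10.0)  # binary fading, each value taken with probability 1/2
V = 0.25                # noise variance v from the example
MU = 1.25               # weight mu from the example

def expect(fn):
    """Expectation over the uniform binary fading distribution."""
    return sum(fn(h) for h in H_VALUES) / len(H_VALUES)

def T(noise, R, h):
    """T(noise) = R h^2 / (noise + R h^2)."""
    return R * h * h / (noise + R * h * h)

def stationarity_gap(R):
    """E[T(v)] - mu E[T(1)]; this vanishes when (29) holds (assumed form)."""
    return expect(lambda h: T(V, R, h)) - MU * expect(lambda h: T(1.0, R, h))

def solve_R(lo=0.1, hi=2.0, iters=200):
    """Bisection on the sign change of stationarity_gap inside [lo, hi]."""
    f_lo = stationarity_gap(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = stationarity_gap(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hermite_gain(R, k):
    """Assumed left-hand side of (35): E[mu T(1)^k - T(v)^k]."""
    return expect(lambda h: MU * T(1.0, R, h) ** k - T(V, R, h) ** k)

R_opt = solve_R()
gain_k3 = hermite_gain(R_opt, 3)
gain_k8 = hermite_gain(R_opt, 8)
```

Under these assumptions, `R_opt` lands next to the value of $R$ quoted above, `gain_k3` is negative and `gain_k8` is positive, which matches the statement that the condition is verified for $k = 8$.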

Fig. 1. Gaussian mu-rate, i.e., expression (28), plotted as a function of $R$ for $\mu = 5/4$, $v = 1/4$, $P = 1.24086308$ and $H$ binary $\{1; 10\}$. Maximum at $R = 0.62043154$.

Fig. 2. LHS of (35) as a function of $R$, for $\mu = 5/4$, $v = 1/4$, $k = 8$, $H$ binary $\{1; 10\}$, positive at $R = 0.62043154$.

Note that we can consider other non-Gaussian encoders, such as when $U$ and $V$ are independent with $U$ Gaussian and $V$ non-Gaussian along Hermite's. Then, we get the following condition. If for $k \geq 3$ and $R$ optimal for $\mu$, we have

$$\mathbb{E}\left(\frac{RH^2}{v + RH^2}\right)^{k} < \mu\left[\mathbb{E}\left(\frac{RH^2}{1 + RH^2}\right)^{k} - \left(\frac{R}{P}\right)^{k}\mathbb{E}\left(\frac{PH^2}{1 + PH^2}\right)^{k}\right], \tag{36}$$

then Gaussian encoders are not optimal. Notice that the previous inequality is stronger than the one in (35) for fixed values of the parameters. Yet, it can still be verified for valid values of the parameters, and there are also codes with $U$ Gaussian and $V$ non-Gaussian that outperform Gaussian codes for some degraded fading BCs.



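Proof of Theorem 4:
Let $\varepsilon, \delta > 0$ and let $X_1$ and $X_2$ be respectively distributed as $g_p(1 + \varepsilon[H_k + \delta H_{4k}])$ and $g_p(1 - \varepsilon[H_k - \delta H_{4k}])$, where $k \neq 1, 2$. We have

$$I(X_1; X_1 + aX_2 + Z_1) = h(X_1 + aX_2 + Z_1) - h(aX_2 + Z_1)$$

$$= I(X_1^G; X_1^G + aX_2^G + Z_1) - D(X_1 + aX_2 + Z_1 \,\|\, X_1^G + aX_2^G + Z_1) + D(aX_2 + Z_1 \,\|\, aX_2^G + Z_1),$$

where $X_i^G$ are independent Gaussian 0-mean and $p$-variance random variables. Hence, we need to evaluate the contribution of each divergence appearing in the previous expression, in order to know if the perturbations are improving on the Gaussian distributions. Let us first analyze $h(X_1 + aX_2 + Z_1)$. The density of $X_1 + aX_2 + Z_1$ is given by

$$g_p(1 + \varepsilon[H_k + \delta H_{4k}]) \star g_{a^2 p}(1 - \varepsilon[H_k - \delta H_{4k}]) \star g_1, \tag{38}$$

which, from Theorem 2, is equal to

$$g_{p + a^2 p + 1}(1 + \varepsilon L),$$

where

$$L = \left\{ \left[\left(\frac{p}{p + a^2 p + 1}\right)^{k/2} - \left(\frac{a^2 p}{p + a^2 p + 1}\right)^{k/2}\right] H_k + \delta\left[\left(\frac{p}{p + a^2 p + 1}\right)^{2k} + \left(\frac{a^2 p}{p + a^2 p + 1}\right)^{2k}\right] H_{4k} - \varepsilon\,\frac{g_p[H_k + \delta H_{4k}] \star g_{a^2 p}[H_k - \delta H_{4k}] \star g_1}{g_{p + a^2 p + 1}} \right\}. \tag{39}$$

Note that each direction in each line of the bracket $\{\cdot\}$ above, including $L$, satisfies (10) and (11). Using Lemma 3, we have

$$\frac{g_p[H_k + \delta H_{4k}] \star g_{a^2 p}[H_k - \delta H_{4k}] \star g_1}{g_{p + a^2 p + 1}} = C_1 H_{2k} + C_2\, \delta H_{5k} + C_3\, \delta^2 H_{8k},$$

where $C_1$, $C_2$, $C_3$ are constants. Therefore, the density of $X_1 + aX_2 + Z_1$ is a Gaussian $g_{p + a^2 p + 1}$ perturbed along the direction $H_k$ in the order $\varepsilon$ and several $H_l$ with $l \geq 2k$ in the order $\varepsilon^2$ (and other directions but that have a $\delta$ order). So we can use Lemma 2 and write

$$h(X_1 + aX_2 + Z_1) = h(X_1^G + aX_2^G + Z_1) - D(X_1 + aX_2 + Z_1 \,\|\, X_1^G + aX_2^G + Z_1).$$

Using Lemma 1, we have

$$D(X_1 + aX_2 + Z_1 \,\|\, X_1^G + aX_2^G + Z_1) = \frac{\varepsilon^2}{2}\left[\left(\frac{p}{p + a^2 p + 1}\right)^{k/2} - \left(\frac{a^2 p}{p + a^2 p + 1}\right)^{k/2}\right]^2 + o(\varepsilon^2) + o(\delta).$$

Hence

$$h(X_1 + aX_2 + Z_1) = h(X_1^G + aX_2^G + Z_1) - \frac{\varepsilon^2}{2}\,(a^k - 1)^2\left(\frac{p}{p + a^2 p + 1}\right)^{k} + o(\varepsilon^2) + o(\delta).$$

Similarly, we get

$$D(aX_2 + Z_1 \,\|\, aX_2^G + Z_1) = \frac{\varepsilon^2}{2}\left(\frac{a^2 p}{a^2 p + 1}\right)^{k} + o(\varepsilon^2) + o(\delta)$$

and

$$I(X_1; X_1 + aX_2 + Z_1) = I(X_1^G; X_1^G + aX_2^G + Z_1) + \frac{\varepsilon^2}{2}\left[\left(\frac{a^2 p}{a^2 p + 1}\right)^{k} - (a^k - 1)^2\left(\frac{p}{p + a^2 p + 1}\right)^{k}\right] + o(\varepsilon^2) + o(\delta).$$

Finally, we have

$$I(X_2; X_2 + aX_1 + Z_2) = I(X_1; X_1 + aX_2 + Z_1)$$

and

$$I(X_1; X_1 + aX_2 + Z_1) + I(X_2; X_2 + aX_1 + Z_2) - I(X_1^G; X_1^G + aX_2^G + Z_1) - I(X_2^G; X_2^G + aX_1^G + Z_2)$$

$$= \varepsilon^2\left[\left(\frac{a^2 p}{a^2 p + 1}\right)^{k} - (a^k - 1)^2\left(\frac{p}{p + a^2 p + 1}\right)^{k}\right] + o(\varepsilon^2) + o(\delta).$$

Hence, if for some $k \geq 3$ we have

$$\left(\frac{a^2 p}{a^2 p + 1}\right)^{k} > \frac{(a^k - 1)^2\, p^k}{(p + a^2 p + 1)^{k}}, \tag{40}$$

we can improve on the iid Gaussian distributions $g_p$ by using the respective Hermite perturbations.

Now, we could have started with $X_1$ and $X_2$ distributed as $g_p(1 + \varepsilon b_k \tilde{H}_k)$ and $g_p(1 + \varepsilon c_k \tilde{H}_k)$, where $\tilde{H}_k$ is defined in (22). With similar expansions, we would then get that we can improve on the Gaussian distributions if for some $b_k$, $c_k$ and $k \neq 1, 2$ we have

$$\left[\left(\frac{a^2 p}{a^2 p + 1}\right)^{k} - \left(1 + a^{2k}\right)\left(\frac{p}{p + a^2 p + 1}\right)^{k}\right]\left(b_k^2 + c_k^2\right) - 4 a^k \left(\frac{p}{p + a^2 p + 1}\right)^{k} b_k c_k > 0.$$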
But the quadratic function

$$(b, c) \in \mathbb{R}^2 \mapsto \gamma\,(b^2 + c^2) - 2\delta\, bc,$$

with $\delta > 0$, can be made positive if and only if $\gamma + \delta > 0$, and is made so by taking $b_k = -c_k$. Hence, the initial choice we made about $X_1$ and $X_2$ is optimal. Moreover, note that for this distribution of $X_1$ and $X_2$, we could have actually chosen $k = 2$ as well. Indeed, even if Lemma 2 tells us that we must use correction terms, these correction terms will cancel out when we consider the sum-rate, since $b_k = -c_k$ and since the correction is in $\varepsilon$. There is however another problem when using $k = 2$, which is that $g_p(1 + \varepsilon H_2)$ has a larger second moment than $p$. However, if we use a scheme of block length 2, we can compensate this excess on the first channel use with the second channel use, and because of the symmetry, we can achieve the desired rate. But this is allowed only with synchronization. We could also have used perturbations that are mixtures of Hermite's, such as $g_p(1 + \varepsilon \sum_k b_k H_k)$. We would then get mixtures of the previous equations as our condition. But in the current problem this will not be helpful. Finally, perturbing iid Gaussian inputs in an independent but non-identically distributed way, i.e., perturbing different components in different Hermite directions, cannot improve on our scheme, from the previous arguments. The only option which is not investigated here (but in a work in progress) is to perturb iid Gaussian inputs in a non-independent manner. Finally, if we work with $k = 1$, the proof sees the following modification. In (39), we now have a term in $H_2$. However, even if this term is in the order $\varepsilon^2$, we can no longer neglect it, since from Lemma 2 an $\varepsilon^2 H_2$ term in the direction comes out as an $\varepsilon^2$ term in the entropy. Hence, we do not get the above condition for $k = 1$, but the one obtained by replacing $(a^k - 1)^2$ with $(a^2 + 1)$, and the condition for positivity can never be fulfilled.
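The local condition of this proof is easy to explore numerically. The sketch below assumes that the condition for improving on Gaussian inputs with $k \geq 3$ reads $\left(\frac{a^2 p}{a^2 p + 1}\right)^k > (a^k - 1)^2 \left(\frac{p}{p + a^2 p + 1}\right)^k$, and that the $k = 1$ variant replaces $(a^k - 1)^2$ with $(a^2 + 1)$, as stated above. The function names are hypothetical.

```python
def local_gain(a, p, k):
    """Assumed form of the condition: positive when the H_k perturbation helps."""
    interference_term = (a * a * p / (a * a * p + 1.0)) ** k
    signal_term = (a ** k - 1.0) ** 2 * (p / (p + a * a * p + 1.0)) ** k
    return interference_term - signal_term

def local_gain_k1(a, p):
    """k = 1 variant, with (a^k - 1)^2 replaced by (a^2 + 1)."""
    interference_term = a * a * p / (a * a * p + 1.0)
    signal_term = (a * a + 1.0) * p / (p + a * a * p + 1.0)
    return interference_term - signal_term

# Hypothetical grid of interference coefficients a and powers p
grid = [(0.1 * i, 0.5 * j) for i in range(1, 21) for j in range(1, 11)]
k1_never_positive = all(local_gain_k1(a, p) < 0 for a, p in grid)
```

For instance, `local_gain(1.0, 1.0, 3)` is positive while `local_gain(0.1, 1.0, 3)` is negative, so under these assumptions the perturbation only helps once the interference coefficient is large enough. The flag `k1_never_positive` illustrates the remark that the $k = 1$ condition can never be fulfilled.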

Proof of Proposition 1:
From Theorem 4, we know that when treating interference as noise and when $F_2(a, p) > 0$, it is better to use encoders drawn from the 2-dimensional distributions $X_1$ and $X_2$, where $(X_1)_1 \sim g_p(1 + \varepsilon H_2)$, $(X_2)_1 \sim g_p(1 - \varepsilon H_2)$, $(X_1)_2 \sim g_p(1 - \varepsilon H_2)$ and $(X_2)_2 \sim g_p(1 + \varepsilon H_2)$, as opposed to using Gaussian distributions. But perturbations in $H_2$ change the second moment of the input distribution. Hence, this scheme mimics time-sharing in our local setting. Moreover, a direct computation also shows that, constraining each user to use Gaussian inputs of arbitrary block length $n$, with arbitrary covariances having a trace bounded by $nP$, the optimal covariances are $p I_n$ if $F_2(a, p) \leq 0$, and otherwise are given by a time-sharing scheme (cf. Definition 1 for the definition of a Gaussian time-sharing scheme and covariance matrices).

Proof of Proposition 2:
Note that when using blind time-sharing, no matter what the delay in the asynchronization of each user is, the users are interfering in $n/4$ channel uses and each have a non-interfering channel in $n/4$ channel uses (the remaining $n/4$ channel uses are not used by any user). Hence, if the receivers have the knowledge of the asynchronization delay, the following sum-rate can be achieved: $\frac{1}{4}\left(\log(1 + 2p) + \log\left(1 + \frac{2p}{1 + 2a^2 p}\right)\right)$. And if the delay is unknown to the receivers, the previous sum-rate can surely not be improved on.

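Disproof of Conjecture 1:
This proof uses similar steps as previous proofs. Using (14), we express

$$I(X; X + hX_1^G + Z) \leq I(X; X + hX_1 + Z)$$

as

$$-D(X + hX_1^G + Z \,\|\, X^G + hX_1^G + Z) \leq -D(X + hX_1 + Z \,\|\, X^G + hX_1^G + Z) + D(hX_1 + Z \,\|\, hX_1^G + Z). \tag{41}$$

We then pick $X, X_1 \sim g_p(1 + \varepsilon H_k)$ and assume that $k$ is even for now. We then have

$$X + hX_1^G + Z \sim g_p(1 + \varepsilon H_k) \star g_{ph^2 + v} = g_{p + ph^2 + v}\left(1 + \varepsilon\left(\frac{p}{p + ph^2 + v}\right)^{k/2} H_k\right)$$

and

$$D(X + hX_1^G + Z \,\|\, X^G + hX_1^G + Z) = \frac{\varepsilon^2}{2}\left(\frac{p}{p + ph^2 + v}\right)^{k} + o(\varepsilon^2).$$

Similarly,

$$X + hX_1 + Z \sim g_p(1 + \varepsilon H_k) \star g_{ph^2}(1 + \varepsilon H_k) \star g_v = g_p(1 + \varepsilon H_k) \star g_{ph^2 + v}\left(1 + \varepsilon\left(\frac{ph^2}{ph^2 + v}\right)^{k/2} H_k\right)$$

$$= g_{p + ph^2 + v}\left(1 + \varepsilon\left(\frac{p}{p + ph^2 + v}\right)^{k/2}\left(1 + h^k\right) H_k + O(\varepsilon^2)\right)$$

and

$$D(X + hX_1 + Z \,\|\, X^G + hX_1^G + Z) = \frac{\varepsilon^2}{2}\left(\frac{p}{p + ph^2 + v}\right)^{k}\left(1 + h^k\right)^2 + o(\varepsilon^2).$$

Finally,

$$hX_1 + Z \sim g_{ph^2}(1 + \varepsilon H_k) \star g_v = g_{ph^2 + v}\left(1 + \varepsilon\left(\frac{ph^2}{ph^2 + v}\right)^{k/2} H_k\right)$$

and

$$D(hX_1 + Z \,\|\, hX_1^G + Z) = \frac{\varepsilon^2}{2}\left(\frac{ph^2}{ph^2 + v}\right)^{k} + o(\varepsilon^2).$$

Therefore, (41) is given by

$$-\left(\frac{p}{p + ph^2 + v}\right)^{k} \leq -\left(\frac{p}{p + ph^2 + v}\right)^{k}\left(1 + h^k\right)^2 + \left(\frac{ph^2}{ph^2 + v}\right)^{k} + o(1)$$

and if

$$\left(\frac{p}{p + ph^2 + v}\right)^{k}\left(1 + h^k\right)^2 - \left(\frac{ph^2}{ph^2 + v}\right)^{k} > \left(\frac{p}{p + ph^2 + v}\right)^{k} \tag{42}$$

for some $k$ even and greater than 4, we have a counter-example to the strong conjecture. Note that, using the same trick as in previous proofs, that is, perturbing along $\tilde{H}_k$ instead of $H_k$, we get that if (42) holds for any $k \geq 3$, we have a counter-example to the strong conjecture. Defining $u := v/p = 1/\mathrm{SNR}$, (42) is equivalent to

$$G(h, u, k) := \frac{(1 + h^k)^2 - 1}{(1 + h^2 + u)^{k}} - \left(\frac{h^2}{h^2 + u}\right)^{k} > 0. \tag{43}$$

Fig. 3.

As shown in Figure 3, this can indeed happen. An interesting observation is that the range where (43) holds is broader when $u$ is larger, i.e., when SNR is smaller. Indeed, when $u = v = 0$, which corresponds to dropping the additive noise $Z$, we do not get a counter-example to the conjecture. But in the presence of Gaussian noise, the conjecture does not hold for some distributions of $X, X_1$. The conjecture had been numerically checked with binary inputs at low SNR, and in this regime, it could not be disproved. With the hint described above, we checked numerically the conjecture with binary inputs at high SNR, and there we found counter-examples.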
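A quick numerical check of the local condition shows the role of the noise. The sketch below assumes that, after dividing by $p^k$ and setting $u = v/p$, the condition takes the form $\frac{(1 + h^k)^2 - 1}{(1 + h^2 + u)^k} - \left(\frac{h^2}{h^2 + u}\right)^k > 0$. The function name and the evaluation points are hypothetical.

```python
def local_conjecture_gap(h, u, k):
    """Assumed form of the local condition: a positive value indicates a
    local counter-example to the strong conjecture."""
    cross_term = ((1.0 + h ** k) ** 2 - 1.0) / (1.0 + h * h + u) ** k
    interference_term = (h * h / (h * h + u)) ** k
    return cross_term - interference_term

# Hypothetical evaluation points with h = 1 and k = 4
with_noise = local_conjecture_gap(1.0, 3.0, 4)     # u = 3, i.e. SNR = 1/3
without_noise = local_conjecture_gap(1.0, 0.0, 4)  # u = 0, no additive noise
```

With these assumptions, `with_noise` is positive and `without_noise` is negative, in line with the observation that dropping the additive noise removes the counter-example.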

VI. Discussion 

We have developed a technique to analyze codes drawn from non-Gaussian ensembles using the Hermite polynomials. While the performance of non-Gaussian inputs is usually hard to analyze, we showed how, with this tool, it reduces to the analysis of analytic power series. This allowed us to show that Gaussian inputs are in general not optimal for the degraded fading Gaussian BC, although they might still be optimal for many fading distributions. For the IC problem, we found that in the asynchronous setting and when treating interference as noise, using non-Gaussian code ensembles ($H_k$ perturbations with $k \geq 3$) can strictly improve on using Gaussian ones,



when the interference coefficient is above a given threshold, 
which significantly improves on the existing threshold (cf. (^I). 
We have also recovered the threshold of the moderate regime 
by using H2 perturbations in the synch setting, showing that 
this global threshold is reflected in our local setting. We also 
met mysteriously in our local setting the other global threshold 
found in ||2l, El, ifTOl . below which treating interference as 
noise with iid Gaussian inputs is optimal. It is worth noting that these two global thresholds (moderate regime and noisy interference) are recovered with our tool from a common analytic function. We hope to understand this better in a work in progress.

The Hermite technique not only provides counter-examples to the optimality of Gaussian inputs, but also gives insight into the competitive situations in Gaussian network problems. For example, in the fading BC problem, the Hermite technique gives a condition on the fading distributions and degradedness (values of $v$) for which non-Gaussian inputs must be used. It also points out that the perturbations are most effective when carried out in an opposite manner for the two users, so as to make the distribution of $X$ close to Gaussian.

Finally, in a different context, local results could be "lifted" to corresponding global results in [1]. There, the localization is made with respect to the channels and not the input distribution; yet, it would be interesting to compare the local with the global behavior for the current problem too. The fact that we have observed some global results locally, as mentioned previously, gives hope for possible local to global extensions. A work in progress aims to use our tool beyond the local setting, in particular, by analyzing all sub-Gaussian distributions. Moreover, there are interesting connections between the results developed in this paper and the properties of the Ornstein-Uhlenbeck process. Indeed, some of these properties have already been used in [3] to solve the long standing entropy monotonicity conjecture, and we are currently investigating these relations more closely.